Add Z-Image LoRA fine-tuning support #1127
Conversation
Codecov Report: ❌ Patch coverage is
I am getting this error while running a fine-tuning task with any diffusion model on the diffusion trainer plugin. I tried fine-tuning SDXL and other Stable Diffusion models but got this error on every run.
I had this error once; updating timm resolved it. But it may or may not help in your case.
dadmobile
left a comment
I wasn't able to get this running. Let's spend some time this week syncing on what's required, and then we can do a patch release and announce this!
When trying to generate with Z-Image Turbo I kept getting a vague error that I will have to debug.
When trying to run training, I would get:
Error in Job: 'FlowMatchEulerDiscreteScheduler' object has no attribute 'add_noise'
Traceback (most recent call last):
File "/home/azureuser/transformerlab-app/api/transformerlab/plugin_sdk/transformerlab/sdk/v1/tlab_plugin.py", line 105, in wrapper
result = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/azureuser/.transformerlab/orgs/3c33c85b-628a-4ca8-93d3-b657cb7973b2/workspace/plugins/diffusion_trainer/main.py", line 818, in train_diffusion_lora
noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/azureuser/.transformerlab/orgs/3c33c85b-628a-4ca8-93d3-b657cb7973b2/workspace/plugins/diffusion_trainer/venv/lib/python3.11/site-packages/diffusers/configuration_utils.py", line 144, in __getattr__
raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'FlowMatchEulerDiscreteScheduler' object has no attribute 'add_noise'
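For context on this failure: flow-matching schedulers in diffusers expose `scale_noise` rather than the DDPM-style `add_noise`. A dispatch shim along these lines (the helper name is mine, not the plugin's) is one way to support both families:

```python
def add_noise_compat(scheduler, latents, noise, timesteps):
    """Noise latents with whichever API the scheduler exposes.

    DDPM-style schedulers implement add_noise(); flow-matching schedulers
    such as FlowMatchEulerDiscreteScheduler implement scale_noise() instead.
    """
    if hasattr(scheduler, "add_noise"):
        return scheduler.add_noise(latents, noise, timesteps)
    if hasattr(scheduler, "scale_noise"):
        # Note the different argument order: (sample, timestep, noise)
        return scheduler.scale_noise(latents, timesteps, noise)
    raise AttributeError(
        f"{type(scheduler).__name__} has neither add_noise nor scale_noise"
    )
```

This is a sketch of the general pattern, not the fix the plugin ended up shipping.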
deep1401
left a comment
I think we've let this sit long enough that we're getting version errors now. Running this, I get errors like the following, which you might want to look at:
Using default home directory: /home/transformerlab/.transformerlab
Error executing plugin: Could not import module 'BloomPreTrainedModel'. Are this object's requirements defined correctly?
Traceback (most recent call last):
File "/home/transformerlab/.transformerlab/orgs/05da06f3-1a86-49e4-a511-f100705fa6f9/workspace/plugins/diffusion_trainer/venv/lib/python3.11/site-packages/transformers/utils/import_utils.py", line 2317, in __getattr__
module = self._get_module(self._class_to_module[name])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/transformerlab/.transformerlab/orgs/05da06f3-1a86-49e4-a511-f100705fa6f9/workspace/plugins/diffusion_trainer/venv/lib/python3.11/site-packages/transformers/utils/import_utils.py", line 2347, in _get_module
raise e
File "/home/transformerlab/.transformerlab/orgs/05da06f3-1a86-49e4-a511-f100705fa6f9/workspace/plugins/diffusion_trainer/venv/lib/python3.11/site-packages/transformers/utils/import_utils.py", line 2345, in _get_module
return importlib.import_module("." + module_name, self.__name__)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/transformerlab/.transformerlab/envs/transformerlab/lib/python3.11/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 940, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "/home/transformerlab/.transformerlab/orgs/05da06f3-1a86-49e4-a511-f100705fa6f9/workspace/plugins/diffusion_trainer/venv/lib/python3.11/site-packages/transformers/models/bloom/modeling_bloom.py", line 29, in <module>
from ...modeling_layers import GradientCheckpointingLayer
File "/home/transformerlab/.transformerlab/orgs/05da06f3-1a86-49e4-a511-f100705fa6f9/workspace/plugins/diffusion_trainer/venv/lib/python3.11/site-packages/transformers/modeling_layers.py", line 28, in <module>
from .processing_utils import Unpack
File "/home/transformerlab/.transformerlab/orgs/05da06f3-1a86-49e4-a511-f100705fa6f9/workspace/plugins/diffusion_trainer/venv/lib/python3.11/site-packages/transformers/processing_utils.py", line 37, in <module>
from .image_utils import ChannelDimension, ImageInput, is_vision_available
File "/home/transformerlab/.transformerlab/orgs/05da06f3-1a86-49e4-a511-f100705fa6f9/workspace/plugins/diffusion_trainer/venv/lib/python3.11/site-packages/transformers/image_utils.py", line 55, in <module>
from torchvision.transforms import InterpolationMode
File "/home/transformerlab/.transformerlab/orgs/05da06f3-1a86-49e4-a511-f100705fa6f9/workspace/plugins/diffusion_trainer/venv/lib/python3.11/site-packages/torchvision/__init__.py", line 10, in <module>
from torchvision import _meta_registrations, datasets, io, models, ops, transforms, utils # usort:skip
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/transformerlab/.transformerlab/orgs/05da06f3-1a86-49e4-a511-f100705fa6f9/workspace/plugins/diffusion_trainer/venv/lib/python3.11/site-packages/torchvision/_meta_registrations.py", line 163, in <module>
@torch.library.register_fake("torchvision::nms")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/transformerlab/.transformerlab/orgs/05da06f3-1a86-49e4-a511-f100705fa6f9/workspace/plugins/diffusion_trainer/venv/lib/python3.11/site-packages/torch/library.py", line 1073, in register
use_lib._register_fake(
File "/home/transformerlab/.transformerlab/orgs/05da06f3-1a86-49e4-a511-f100705fa6f9/workspace/plugins/diffusion_trainer/venv/lib/python3.11/site-packages/torch/library.py", line 203, in _register_fake
handle = entry.fake_impl.register(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/transformerlab/.transformerlab/orgs/05da06f3-1a86-49e4-a511-f100705fa6f9/workspace/plugins/diffusion_trainer/venv/lib/python3.11/site-packages/torch/_library/fake_impl.py", line 50, in register
if torch._C._dispatch_has_kernel_for_dispatch_key(self.qualname, "Meta"):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: operator torchvision::nms does not exist
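The `operator torchvision::nms does not exist` failure is the classic symptom of mismatched torch and torchvision releases in the plugin venv. A rough pairing check, based on the usual torch 2.x release cadence (torchvision `0.(minor + 15)`, e.g. torch 2.1 with torchvision 0.16) — this is a heuristic of mine, not an official table:

```python
def expected_torchvision_minor(torch_version: str) -> int:
    """Heuristic pairing for torch 2.x: torchvision 0.(torch_minor + 15).

    e.g. torch 2.1.x pairs with torchvision 0.16, torch 2.5.x with 0.20.
    Always confirm against the torchvision release notes before pinning.
    """
    base = torch_version.split("+")[0]  # strip local tags like "+cu118"
    major, minor = (int(p) for p in base.split(".")[:2])
    if major != 2:
        raise ValueError("heuristic only covers torch 2.x releases")
    return minor + 15
```

Reinstalling both packages together from the same index (as setup.sh attempts) is the usual remedy.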
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings. Use the following commands to manage reviews:
📝 Walkthrough

Adds Z-Image pipeline support across training and inference (loading, tokenizer/prompt encoding, FlowMatchSFTLoss, LoRA handling and save paths), bumps the diffusion_trainer plugin version, expands the installer script, adds model-reference resolution and generation-kwargs filtering for image pipelines, and introduces runtime/config helpers and related tests.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant Trainer as Training Pipeline
    participant Loader as ModelConfig Loader
    participant ZPipe as ZImagePipeline
    participant Tokenizer as Z-Image Tokenizer
    participant Loss as FlowMatchSFTLoss
    participant Saver as LoRA Saver
    User->>Trainer: start Z-Image training
    Trainer->>Loader: build_zimage_model_configs(model_path)
    Loader-->>Trainer: model & tokenizer configs
    Trainer->>ZPipe: instantiate pipeline (device / dtype / freeze parts)
    ZPipe-->>Trainer: pipeline ready
    Trainer->>Tokenizer: encode_prompt_zimage(prompts)
    Tokenizer-->>Trainer: prompt embeddings
    Trainer->>Loss: forward(batch, embeddings, sizes/crops)
    Loss-->>Trainer: loss
    Trainer->>Saver: save LoRA (safetensors or fallback)
    Saver-->>User: checkpoint saved
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Actionable comments posted: 5
🧹 Nitpick comments (1)
api/transformerlab/plugins/diffusion_trainer/main.py (1)
570-586: Remove duplicate VAE xFormers enablement. The VAE call is executed twice for non-ZImage paths. Keep a single guarded call.
♻️ Suggested cleanup
```diff
- if hasattr(vae, "enable_xformers_memory_efficient_attention"):
-     vae.enable_xformers_memory_efficient_attention()
- if not is_zimage and hasattr(vae, "enable_xformers_memory_efficient_attention"):
+ if not is_zimage and hasattr(vae, "enable_xformers_memory_efficient_attention"):
      vae.enable_xformers_memory_efficient_attention()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/transformerlab/plugins/diffusion_trainer/main.py` around lines 570 - 586, The VAE's enable_xformers_memory_efficient_attention is being called twice in the xFormers enable block; remove the duplicate call so the code only invokes vae.enable_xformers_memory_efficient_attention() once and guard it with hasattr(vae, "enable_xformers_memory_efficient_attention") and the is_zimage check as appropriate (use unet.enable_xformers_memory_efficient_attention() and a single conditional call to vae.enable_xformers_memory_efficient_attention() when available and when not is_zimage).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@api/transformerlab/plugins/diffusion_trainer/main.py`:
- Around line 62-95: In build_zimage_model_configs, validate the local-path glob
results (transformer_paths, text_encoder_paths, vae_paths, and tokenizer path)
when model_id_or_path is a directory: if any of these lists/paths are empty or
missing, raise a clear ValueError describing which component is missing instead
of letting downstream opaque errors occur; keep using ModelConfig for each
component but fail fast with an explicit message naming the missing asset (e.g.,
"missing transformer files", "missing text_encoder files", "missing vae file",
or "missing tokenizer directory") so callers can immediately act.
- Around line 453-483: When defaulting Z-Image to BF16 in the mixed-precision
logic, guard that choice with an actual hardware BF16 support check: inside the
block that sets weight_dtype based on is_zimage and mixed_precision (referencing
is_zimage, mixed_precision, weight_dtype, device), only set weight_dtype =
torch.bfloat16 if CUDA is available and torch.cuda.is_bf16_supported() returns
True; otherwise fall back to torch.float32 (or respect explicit "bf16" request).
Ensure the check runs before assigning weight_dtype so non-BF16 GPUs/CPUs won't
get bfloat16 by default.
- Around line 491-553: The build currently relies on diffsynth APIs used around
ZImagePipeline.from_pretrained and pipe.scheduler.set_timesteps (seen in
main.py), but setup.sh installs diffsynth without a version pin; update setup.sh
to pin diffsynth to a compatible minimum/locked version (e.g., change the
install spec to diffsynth>=0.X.Y or a specific tested release) so the pipeline
code (ZImagePipeline.from_pretrained, scheduler.set_timesteps, and related
behavior) remains stable across environments.
- Around line 1005-1018: The code references an unreleased class
FlowMatchSFTLoss (imported from diffsynth.diffusion.loss) which isn't available
in public diffsynth v2.0.4; update the codebase and dependency declarations:
either replace FlowMatchSFTLoss usage in main.py (around the is_zimage branch
where input_latents, prompt_embeds, vae_encoder, and encode_prompt_zimage are
used) with a public, supported loss class or vendor the missing implementation,
and then pin and document the exact diffsynth fork/commit or custom package in
requirements.txt or pyproject.toml; also ensure the replacement/vendored code returns a
PyTorch tensor (compatible with the .item() call) and preserve the
gradient_checkpointing flags (use_gradient_checkpointing and
use_gradient_checkpointing_offload) so runtime behavior remains consistent.
In `@api/transformerlab/plugins/diffusion_trainer/setup.sh`:
- Around line 3-7: Update the PEFT requirement from "peft>=0.15.0" to
"peft>=0.17.0" in the shell install line (replace the existing uv pip install
"peft>=0.15.0" diffsynth command with uv pip install "peft>=0.17.0" diffsynth)
and also adjust the PEFT version constraint in the project-level pyproject.toml
optional dependencies entries so they no longer pin to 0.14.0/0.15.2 but allow
>=0.17.0, ensuring consistency with diffusers 0.36.0 and other diffusion
plugins.
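The BF16 guard described in the mixed-precision item above reduces to simple selection logic. A sketch with the hardware probes passed in as booleans (in the real code these would come from `torch.cuda.is_available()` and `torch.cuda.is_bf16_supported()`; this helper is illustrative, not the plugin's actual function):

```python
def pick_weight_dtype(requested, cuda_available, bf16_supported):
    """Choose a training dtype, only defaulting to bf16 when hardware allows.

    `requested` is the user's mixed_precision setting ("bf16", "fp16", or None).
    """
    if requested == "bf16":
        # Honor an explicit request only if the device can actually run bf16.
        return "bfloat16" if (cuda_available and bf16_supported) else "float32"
    if requested == "fp16":
        return "float16"
    # Z-Image default: bf16 when supported, otherwise full precision.
    if cuda_available and bf16_supported:
        return "bfloat16"
    return "float32"
```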
---
Nitpick comments:
In `@api/transformerlab/plugins/diffusion_trainer/main.py`:
- Around line 570-586: The VAE's enable_xformers_memory_efficient_attention is
being called twice in the xFormers enable block; remove the duplicate call so
the code only invokes vae.enable_xformers_memory_efficient_attention() once and
guard it with hasattr(vae, "enable_xformers_memory_efficient_attention") and the
is_zimage check as appropriate (use
unet.enable_xformers_memory_efficient_attention() and a single conditional call
to vae.enable_xformers_memory_efficient_attention() when available and when not
is_zimage).
In `api/transformerlab/plugins/diffusion_trainer/main.py`:

```python
pipe = None
if is_zimage:
    # Ensure the model is downloaded locally if it's not already a directory
    if not os.path.isdir(pretrained_model_name_or_path):
        from huggingface_hub import snapshot_download

        print(f"Downloading Z-Image model {pretrained_model_name_or_path} from Hugging Face...")
        pretrained_model_name_or_path = snapshot_download(
            repo_id=pretrained_model_name_or_path,
            allow_patterns=["*.safetensors", "*.json", "tokenizer/*"],
        )
        print(f"Model downloaded to: {pretrained_model_name_or_path}")

    model_configs, tokenizer_config = build_zimage_model_configs(pretrained_model_name_or_path)
    pipe = ZImagePipeline.from_pretrained(
        torch_dtype=weight_dtype,
        device=device,
        model_configs=model_configs,
        tokenizer_config=tokenizer_config,
    )

    pipe.scheduler.set_timesteps(int(args.get("num_train_timesteps", 1000)), training=True)
    noise_scheduler = pipe.scheduler
    tokenizer = pipe.tokenizer
    text_encoder = pipe.text_encoder
    vae_encoder = pipe.vae_encoder
    vae_decoder = pipe.vae_decoder
    unet = pipe.dit
    model_component_name = "dit"
    text_encoder_2 = None
    tokenizer_2 = None
    vae = None
else:
    temp_pipeline = AutoPipelineForText2Image.from_pretrained(pretrained_model_name_or_path, **pipeline_kwargs)

    # Extract components from the loaded pipeline
    noise_scheduler = temp_pipeline.scheduler
    tokenizer = temp_pipeline.tokenizer
    text_encoder = temp_pipeline.text_encoder
    vae = temp_pipeline.vae

    # Handle different architectures: FluxPipeline uses 'transformer', others use 'unet'
    # We use 'unet' as a unified variable name for the main model component regardless of architecture
    if hasattr(temp_pipeline, "transformer"):
        # FluxPipeline and other transformer-based models
        unet = temp_pipeline.transformer
        model_component_name = "transformer"
    else:
        # SD 1.x, SDXL, SD3 and other UNet-based models
        unet = temp_pipeline.unet
        model_component_name = "unet"

    # Handle SDXL case with dual text encoders
    text_encoder_2 = getattr(temp_pipeline, "text_encoder_2", None)
    tokenizer_2 = getattr(temp_pipeline, "tokenizer_2", None)

    # Clean up temporary pipeline
    del temp_pipeline
    torch.cuda.empty_cache() if torch.cuda.is_available() else None

print(f"Model components loaded successfully: {pretrained_model_name_or_path}")
print(f"Architecture detected - Model component ({model_component_name}): {type(unet).__name__}")
if text_encoder_2 is not None:
```
🧩 Analysis chain
🏁 Script executed:
```shell
# Find dependency files to see diffsynth version pinning
find . -type f \( -name "requirements*.txt" -o -name "pyproject.toml" -o -name "setup.py" -o -name "poetry.lock" -o -name "Pipfile" \) | head -20
```

Repository: transformerlab/transformerlab-app
Length of output: 216
🏁 Script executed:
```shell
# Search for diffsynth imports in the codebase to understand usage
rg "diffsynth" --type py -i | head -30
```

Repository: transformerlab/transformerlab-app
Length of output: 378
🏁 Script executed:
```shell
# Look for the specific file mentioned in the review
head -50 api/transformerlab/plugins/diffusion_trainer/main.py
```

Repository: transformerlab/transformerlab-app
Length of output: 1472
🏁 Script executed:
```shell
# Check the pyproject.toml for diffsynth version
cat api/pyproject.toml | grep -A 50 "dependencies"
```

Repository: transformerlab/transformerlab-app
Length of output: 2066
🌐 Web query:
diffsynth ZImagePipeline from_pretrained API documentation
💡 Result:
diffsynth.pipelines.z_image.ZImagePipeline.from_pretrained is defined in the DiffSynth-Studio repo here: diffsynth/pipelines/z_image.py. It has this signature (as of main):
```python
from_pretrained(
    torch_dtype=torch.bfloat16,
    device=get_device_type(),
    model_configs: list[ModelConfig] = [],
    tokenizer_config: ModelConfig = ModelConfig(
        model_id="Tongyi-MAI/Z-Image-Turbo",
        origin_file_pattern="tokenizer/",
    ),
    vram_limit: float = None,
    enable_npu_patch: bool = True,
)
```
[1]
Parameter meaning (project docs/examples):
- `torch_dtype`: compute dtype used by the pipeline (not necessarily how weights are stored/offloaded). [2]
- `device`: pipeline execution device (e.g. "cuda"). [1]
- `model_configs`: list of `ModelConfig` entries describing what to download/load (by `model_id` + `origin_file_pattern`, or local `path`, and optionally VRAM/offload settings). [2][3]
- `tokenizer_config`: `ModelConfig` pointing to the tokenizer folder (default is the Z-Image-Turbo tokenizer). [1][3]
- `vram_limit`: enables VRAM-management behavior during `download_and_load_models(...)` (used in official examples as "available VRAM (GiB) minus some margin"). [1][3]
- `enable_npu_patch`: applies an NPU compatibility patch during init. [1]
Example usage (Z-Image Turbo quick start): uses model_configs=[ModelConfig(... transformer ...), ModelConfig(... text_encoder ...), ModelConfig(... vae ...)] plus tokenizer_config=ModelConfig(..., origin_file_pattern="tokenizer/") and an optional vram_limit=.... [3]
Related docs pointer: the general “Loading Models” section explains from_pretrained, ModelConfig, remote vs local loading, and env vars like DIFFSYNTH_MODEL_BASE_PATH / DIFFSYNTH_SKIP_DOWNLOAD. [2]
Sources: [1] [2] [3]
🌐 Web query:
diffsynth scheduler set_timesteps training parameter
💡 Result:
In DiffSynth(-Studio) the meaning of the training parameter depends on which scheduler you’re using:
FlowMatchScheduler (diffsynth/schedulers/flow_match.py)
set_timesteps is defined as:
`set_timesteps(num_inference_steps=100, denoising_strength=1.0, training=False, shift=None)`
- `training=False` (default): only builds `self.sigmas` and `self.timesteps`.
- `training=True`: additionally computes `self.linear_timesteps_weights`, which is later consumed by `training_weight(timestep)` for per-timestep loss weighting. (huggingface.co)
So training is not “how many training timesteps”—it’s a flag to enable training-time timestep weights.
Typical usage for training with FlowMatch:
- Call `scheduler.set_timesteps(N, training=True)` where `N` is the number of discrete timesteps you want to sample from (often `N=1000` to match the model's `num_train_timesteps` scale).
- Sample an index/value from `scheduler.timesteps`, use `add_noise(...)`, and optionally weight the loss via `scheduler.training_weight(timestep)`. (huggingface.co)
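The usage just described can be condensed into a runnable toy. The stand-in class below only mimics the shape of the diffsynth API quoted here (linear sigmas, uniform weights); it is not the real implementation:

```python
import random

class ToyFlowMatchScheduler:
    """Minimal stand-in mirroring the FlowMatchScheduler surface above."""

    def set_timesteps(self, num_inference_steps=100, training=False):
        # sigmas run from 1.0 down toward 0; timesteps scale them to [0, 1000]
        self.sigmas = [1.0 - i / num_inference_steps for i in range(num_inference_steps)]
        self.timesteps = [s * 1000 for s in self.sigmas]
        if training:
            # training=True additionally prepares per-timestep loss weights
            self.linear_timesteps_weights = [1.0] * num_inference_steps

    def add_noise(self, sample, noise, timestep):
        # flow-matching interpolation between data and noise
        sigma = timestep / 1000
        return (1 - sigma) * sample + sigma * noise

    def training_weight(self, timestep):
        # weight of the sigma bucket nearest the sampled timestep
        idx = min(
            range(len(self.timesteps)),
            key=lambda i: abs(self.timesteps[i] - timestep),
        )
        return self.linear_timesteps_weights[idx]

scheduler = ToyFlowMatchScheduler()
scheduler.set_timesteps(1000, training=True)
t = random.choice(scheduler.timesteps)
noisy = scheduler.add_noise(0.0, 1.0, t)   # scalar stand-ins for latents/noise
weight = scheduler.training_weight(t)
```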
EnhancedDDIMScheduler (DDIM-style training scripts)
Some DiffSynth training code resets the scheduler and calls:
scheduler.set_timesteps(1000)
Here, there is no training= flag involved; it’s just setting the timestep grid length (often to 1000). (gitee.com)
Citations:
- 1: https://huggingface.co/spaces/svjack/FantasyTalking/blob/main/diffsynth/schedulers/flow_match.py
- 2: https://huggingface.co/spaces/svjack/FantasyTalking/blob/main/diffsynth/schedulers/flow_match.py
- 3: https://gitee.com/mirrors/diffsynth-studio/blob/ExVideo/ExVideo_animatediff_train.py?utm_source=openai
🏁 Script executed:
```shell
# Check if diffsynth is listed in a separate requirements file for the plugin
find api/transformerlab/plugins/diffusion_trainer -name "*.txt" -o -name "*.toml" | xargs cat 2>/dev/null
```

Repository: transformerlab/transformerlab-app
Length of output: 59
🏁 Script executed:
```shell
# Check for any setup files or configuration in the diffusion_trainer directory
ls -la api/transformerlab/plugins/diffusion_trainer/
```

Repository: transformerlab/transformerlab-app
Length of output: 432
🏁 Script executed:
```shell
# Search the entire codebase for where diffsynth dependency might be specified
rg "diffsynth" --type toml --type txt
```

Repository: transformerlab/transformerlab-app
Length of output: 59
🏁 Script executed:
```shell
# Check what's in setup.sh - this likely manages plugin dependencies
cat api/transformerlab/plugins/diffusion_trainer/setup.sh
```

Repository: transformerlab/transformerlab-app
Length of output: 398
🏁 Script executed:
```shell
# Also check if there's any documentation about how plugin dependencies are managed
find . -name "setup.sh" -exec cat {} \; | head -100
```

Repository: transformerlab/transformerlab-app
Length of output: 4920
Pin diffsynth version to ensure API compatibility across environments.
The code correctly uses ZImagePipeline.from_pretrained(device=..., model_configs=..., tokenizer_config=...) and scheduler.set_timesteps(..., training=True), which are valid in current diffsynth versions. However, setup.sh installs diffsynth without a version constraint, whereas peft is pinned to >=0.15.0. Without version pinning, future diffsynth releases could introduce breaking API changes. Add a version constraint (e.g., "diffsynth>=0.X.Y") to setup.sh to guarantee consistent behavior.
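Concretely, the suggested change to setup.sh might look like the following; the version is deliberately left as a placeholder, to be replaced with whatever release the plugin was actually validated against:

```shell
# Pin diffsynth so ZImagePipeline.from_pretrained and
# scheduler.set_timesteps(..., training=True) keep the expected signatures.
# Replace <tested-version> with the release this plugin was validated against.
uv pip install "peft>=0.15.0" "diffsynth==<tested-version>"
```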
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@api/transformerlab/plugins/diffusion_trainer/main.py` around lines 491 - 553,
The build currently relies on diffsynth APIs used around
ZImagePipeline.from_pretrained and pipe.scheduler.set_timesteps (seen in
main.py), but setup.sh installs diffsynth without a version pin; update setup.sh
to pin diffsynth to a compatible minimum/locked version (e.g., change the
install spec to diffsynth>=0.X.Y or a specific tested release) so the pipeline
code (ZImagePipeline.from_pretrained, scheduler.set_timesteps, and related
behavior) remains stable across environments.
In `api/transformerlab/plugins/diffusion_trainer/main.py`:

```python
if is_zimage:
    pixel_values = batch["pixel_values"].to(device, dtype=weight_dtype)
    input_latents = vae_encoder(pixel_values)
    prompt_embeds = encode_prompt_zimage(pipe, batch["prompt"], device)

    loss = FlowMatchSFTLoss(
        pipe,
        input_latents=input_latents,
        prompt_embeds=prompt_embeds,
        image_embeds=None,
        image_latents=None,
        use_gradient_checkpointing=args.get("gradient_checkpointing", False),
        use_gradient_checkpointing_offload=False,
    )
```
🧩 Analysis chain
🏁 Script executed:
```shell
# First, find where FlowMatchSFTLoss is defined or imported
rg "FlowMatchSFTLoss" --type py -B 2 -A 2
```

Repository: transformerlab/transformerlab-app
Length of output: 975
🏁 Script executed:
```shell
# Also check the specific file to see imports
head -100 api/transformerlab/plugins/diffusion_trainer/main.py | grep -E "^import|^from"
```

Repository: transformerlab/transformerlab-app
Length of output: 895
🏁 Script executed:
```shell
# Check if it's defined in the file itself
rg "class FlowMatchSFTLoss" --type py -A 20
```

Repository: transformerlab/transformerlab-app
Length of output: 59
🏁 Script executed:
```shell
# Check if diffsynth is vendored or installed
find . -name "*diffsynth*" -type d | head -20
```

Repository: transformerlab/transformerlab-app
Length of output: 59
🏁 Script executed:
```shell
# Check requirements files for diffsynth version
find . -name "requirements*.txt" -o -name "setup.py" -o -name "pyproject.toml" | xargs grep -l "diffsynth" 2>/dev/null
```

Repository: transformerlab/transformerlab-app
Length of output: 61
🏁 Script executed:
```shell
# Search for FlowMatchSFTLoss in the entire repo
rg "class FlowMatchSFTLoss" --type py
```

Repository: transformerlab/transformerlab-app
Length of output: 59
🏁 Script executed:
```shell
# Check for any diffsynth submodules or vendored code
ls -la | grep -i diffsynth || echo "No top-level diffsynth directory"
```

Repository: transformerlab/transformerlab-app
Length of output: 109
🌐 Web query:
diffsynth FlowMatchSFTLoss class signature parameters return type
💡 Result:
I couldn’t find any public FlowMatchSFTLoss class in the searchable DiffSynth-Studio sources/docs or the PyPI diffsynth project page (v2.0.4, released Jan 30, 2026). The only clearly indexed “FlowMatch*” API I could locate is FlowMatchScheduler (scheduler implementation), not an SFT loss class. [1], [2]
Because FlowMatchSFTLoss doesn’t appear in public docs/search results, I can’t reliably state its class signature, __init__ parameters, or forward/return type from web sources.
How to get the exact signature from your installed diffsynth:
```python
import inspect
from diffsynth import FlowMatchSFTLoss  # or its real import path if different

print(FlowMatchSFTLoss)
print(inspect.signature(FlowMatchSFTLoss.__init__))
print(inspect.signature(FlowMatchSFTLoss.forward))
print(inspect.getsource(FlowMatchSFTLoss.forward))
```

If you paste the import path you're using (or the traceback showing where FlowMatchSFTLoss is defined), I can tell you the exact parameters and return type from that definition.
Sources:
[1] PyPI project page for diffsynth (v2.0.4, Jan 30, 2026) (pypi.org)
[2] DiffSynth-Studio docs mentioning FlowMatchScheduler (diffsynth-studio.readthedocs.io)
Citations:
- 1: https://pypi.org/project/diffsynth/
- 2: https://diffsynth-studio.readthedocs.io/zh-cn/latest/tutorial/Schedulers.html?utm_source=openai
🏁 Script executed:
```shell
# Check if there's a local diffsynth implementation in the codebase
rg "from diffsynth.diffusion" --type py -l
```

Repository: transformerlab/transformerlab-app
Length of output: 129
🏁 Script executed:
```shell
# Search for any loss-related classes defined locally
rg "class.*Loss" --type py | grep -i "flowmatch\|sft"
```

Repository: transformerlab/transformerlab-app
Length of output: 59
🏁 Script executed:
```shell
# Check if diffsynth.diffusion.loss is implemented anywhere locally
find . -path "*/diffsynth/diffusion/loss*" -type f
```

Repository: transformerlab/transformerlab-app
Length of output: 59
🏁 Script executed:
```shell
# Check the git history to see if this was recently added
git log --oneline --all -- "*FlowMatchSFTLoss*" 2>/dev/null | head -10
```

Repository: transformerlab/transformerlab-app
Length of output: 59
🏁 Script executed:
```shell
# Check what happens with the 'loss' variable after it's created
sed -n '1005,1050p' api/transformerlab/plugins/diffusion_trainer/main.py
```

Repository: transformerlab/transformerlab-app
Length of output: 2444
🏁 Script executed:
```shell
# Check for requirements to see diffsynth version constraint
find . -name "requirements*.txt" -o -name "setup.py" -o -name "pyproject.toml" -o -name "setup.cfg" | xargs cat 2>/dev/null | grep -i diffsynth
```

Repository: transformerlab/transformerlab-app
Length of output: 59
🏁 Script executed:
```shell
# Check if there's a git submodule or fork
cat .gitmodules 2>/dev/null | grep -i diffsynth || echo "No submodule found"
```

Repository: transformerlab/transformerlab-app
Length of output: 95
🏁 Script executed:
```shell
# Check for any notes or documentation about diffsynth version
find . -name "*.md" -o -name "*.txt" | xargs grep -l "FlowMatchSFTLoss\|diffsynth.*version" 2>/dev/null | head -5
```

Repository: transformerlab/transformerlab-app
Length of output: 59
FlowMatchSFTLoss is not found in any public diffsynth version—verify dependency.
FlowMatchSFTLoss does not exist in the publicly available diffsynth library (v2.0.4, latest on PyPI). The code imports from diffsynth.diffusion.loss, but this class is undocumented and unreleased. Confirm that your development environment uses a specific diffsynth fork, branch, or unreleased version, and document this dependency in requirements.txt or pyproject.toml to prevent deployment failures. The usage pattern (.item() at line 1019) indicates it should return a PyTorch tensor, but this callable contract cannot be verified without access to the actual implementation.
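A generic guard for this class of problem — resolving an attribute only if the installed package actually provides it — can be sketched as follows (the helper name is mine, and the probe in the comment uses the import path seen in main.py):

```python
import importlib

def optional_import(module_name, attr):
    """Return `attr` from `module_name`, or None if either is unavailable.

    Useful for probing whether the installed diffsynth build actually ships
    FlowMatchSFTLoss before training starts, instead of failing mid-run.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, attr, None)

# Example probe:
# FlowMatchSFTLoss = optional_import("diffsynth.diffusion.loss", "FlowMatchSFTLoss")
# if FlowMatchSFTLoss is None:
#     raise RuntimeError("installed diffsynth does not provide FlowMatchSFTLoss; "
#                        "install the documented fork/commit")
```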
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@api/transformerlab/plugins/diffusion_trainer/main.py` around lines 1005 -
1018, The code references an unreleased class FlowMatchSFTLoss (imported from
diffsynth.diffusion.loss) which isn't available in public diffsynth v2.0.4;
update the codebase and dependency declarations: either replace FlowMatchSFTLoss
usage in main.py (around the is_zimage branch where input_latents,
prompt_embeds, vae_encoder, and encode_prompt_zimage are used) with a public,
supported loss class or vendor the missing implementation, and then pin and
document the exact diffsynth fork/commit or custom package in requirements.txt
or pyproject.toml; also ensure the replacement/vendored code returns a PyTorch tensor
(compatible with the .item() call) and preserve the gradient_checkpointing flags
(use_gradient_checkpointing and use_gradient_checkpointing_offload) so runtime
behavior remains consistent.
In `api/transformerlab/plugins/diffusion_trainer/setup.sh`:

```shell
# Install compatible torch and torchvision first to avoid version conflicts
uv pip install torch torchvision diffusers transformers --extra-index-url https://download.pytorch.org/whl/cu118

# Install PEFT and diffsynth
uv pip install "peft>=0.15.0" diffsynth
```
🧩 Analysis chain
🌐 Web query:
torch diffusers peft transformers compatibility requirements 2025
💡 Result:
Practical compatibility requirements (late‑2025 / 2026-era HF stack)
Diffusers
`diffusers` v0.36.0 is tested on Python 3.8+ and PyTorch 1.4+, and the docs explicitly install it alongside `transformers` (e.g. `diffusers["torch"] transformers`). [1]
Diffusers ↔ PEFT (LoRA/adapters)
- If you install Diffusers from git / dev builds (e.g. `0.36.0.dev0`), Diffusers can enforce `peft>=0.17.0`; a real-world failure report shows an import error when `peft==0.7.0` is present and Diffusers requires `peft>=0.17.0`. [2]
Transformers ↔ PEFT
- Transformers v5 is incompatible with `peft<0.18.0`; if you plan to use Transformers v5, use `peft>=0.18.0`. [3]
- PEFT 0.18.0+ also requires Python 3.10+ (drops 3.9). [3]
Transformers ↔ huggingface-hub (common resolver trap)
- Recent `transformers` (example given: 4.57.3, as of 2026-01-05) still requires `huggingface-hub<1.0`; letting your environment pick `huggingface-hub==1.x` can force downgrades/breakage. [4]
A “safe” 2025-ish pinning strategy (what the above implies)
- If staying on Transformers 4.x: use
diffusers>=0.36,transformers~=4.5x,peft>=0.17, and ensurehuggingface-hub<1.0. [1][2][4] - If moving to Transformers 5.x: use
peft>=0.18.0(and therefore Python 3.10+), plus whatever Diffusers release notes recommend for that Diffusers version. [3]
Sources: Diffusers installation docs v0.36.0 [1]; Diffusers issue documenting peft>=0.17.0 requirement on dev installs [2]; PEFT 0.18.0 release notes on Transformers v5 + Python requirement [3]; dependency note on transformers requiring huggingface-hub<1.0 [4].
🏁 Script executed:

```shell
# Search for requirements files and dependency specifications in the project
find . -type f \( -name "requirements*.txt" -o -name "setup.py" -o -name "pyproject.toml" -o -name "constraints*.txt" \) | head -20
```

🏁 Script executed:

```shell
# Check the setup.sh file and surrounding context
cat -n api/transformerlab/plugins/diffusion_trainer/setup.sh
```

🏁 Script executed:

```shell
# Look for other setup or installation documentation
find . -type f \( -name "*.md" -o -name "*.txt" \) | xargs grep -l "torch\|diffusers\|peft" 2>/dev/null | head -10
```

🏁 Script executed:

```shell
# Check the actual pyproject.toml for dependency management
cat -n api/pyproject.toml | head -100
```

🏁 Script executed:

```shell
# Search for torch/base plugin setup references
rg -n "torch|base.plugin|base_plugin|plugin.*venv" --type py --type toml --type sh api/
```

🏁 Script executed:

```shell
# Check if there's a parent/base setup script
find api/transformerlab/plugins -name "setup.sh" -o -name "base*.sh" | xargs cat -n
```

🏁 Script executed:

```shell
# Check if other plugins use version pinning or rely on project-level pins
rg -A2 "uv pip install" api/transformerlab/plugins/*/setup.sh | grep -E "(peft|diffusers|transformers)" | head -30
```

🏁 Script executed:

```shell
# Verify the actual pyproject.toml peft versions for all platforms
grep -A5 "peft==" api/pyproject.toml
```
Increase the `peft` version constraint to `>=0.17.0` for compatibility with `diffusers` 0.36.0.
The specified `peft>=0.15.0` is below the minimum required by `diffusers` 0.36.0 (which requires `peft>=0.17.0` per the Hugging Face release notes). This conflicts with other diffusion-based plugins in the codebase (`llama_trainer`, `image_diffusion`, `fastchat_server`), which correctly use `peft>=0.17.0`. Additionally, the project-level `pyproject.toml` pins `peft` to 0.14.0 (nvidia/cpu) and 0.15.2 (rocm), which will force a downgrade and create a version mismatch at runtime.
Suggested fix:

```shell
uv pip install "peft>=0.17.0" diffsynth
```

Verify this is also addressed in the project-level `pyproject.toml` optional dependencies.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@api/transformerlab/plugins/diffusion_trainer/setup.sh` around lines 3 - 7,
Update the PEFT requirement from "peft>=0.15.0" to "peft>=0.17.0" in the shell
install line (replace the existing uv pip install "peft>=0.15.0" diffsynth
command with uv pip install "peft>=0.17.0" diffsynth) and also adjust the PEFT
version constraint in the project-level pyproject.toml optional dependencies
entries so they no longer pin to 0.14.0/0.15.2 but allow >=0.17.0, ensuring
consistency with diffusers 0.36.0 and other diffusion plugins.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@api/transformerlab/plugin_sdk/plugin_harness.py`:
- Around line 110-119: The parameter target_env_var in set_config_env_vars is
declared Optional but later used as a dict key in os.environ[target_env_var],
which will raise if None; update set_config_env_vars to either require
target_env_var (change its type to str) or add a guard at the top of the
function (e.g., raise ValueError or return early if target_env_var is None)
before calling get_db_config_value and setting os.environ, referencing the
function name set_config_env_vars and the os.environ assignment to locate and
fix the issue.
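The guard the prompt describes is a one-line check at the top of the function. A sketch with stand-in signatures — the real `set_config_env_vars` and `get_db_config_value` live in `plugin_harness.py` and differ in detail:

```python
import os
from typing import Optional

def get_db_config_value(key: str,
                        team_id: Optional[str] = None,
                        user_id: Optional[str] = None) -> Optional[str]:
    # Stand-in for the real DB lookup; returns None when the key is unset.
    return {"API_KEY": "secret"}.get(key)

def set_config_env_vars(config_key: str,
                        target_env_var: Optional[str] = None,
                        team_id: Optional[str] = None,
                        user_id: Optional[str] = None) -> None:
    # Guard suggested by the review: fail fast instead of letting
    # os.environ[None] raise a confusing TypeError later.
    if target_env_var is None:
        raise ValueError("target_env_var must be provided")
    value = get_db_config_value(config_key, team_id=team_id, user_id=user_id)
    if value is not None:
        os.environ[target_env_var] = value
```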
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@api/transformerlab/plugins/diffusion_trainer/setup.sh`:
- Around line 1-9: The install script exits successfully even if the core
dependency install fails; update the shell script (setup.sh) to fail fast by
enabling strict error handling (add set -euo pipefail at the top) or by
appending explicit failure checks to the uv pip install invocation (e.g., ensure
the "uv pip install" command in the script will exit non‑zero on failure and
propagate that by using || exit 1). Target the top of the script and the "uv pip
install" line to ensure dependency install failures surface immediately.
I fixed this and tested on my Azure VM.
🧹 Nitpick comments (1)
api/transformerlab/plugins/image_diffusion/main.py (1)
4-5: Consider centralizing model-reference helpers to avoid drift.
These utilities mirror the ones in `api/transformerlab/plugins/image_diffusion/diffusion_worker.py`; extracting them into a shared module would reduce duplication and keep behavior consistent.
Also applies to: 327-474
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/transformerlab/plugins/image_diffusion/main.py` around lines 4 - 5, The model-reference helper functions duplicated between image_diffusion main and diffusion_worker should be extracted into a single shared module (e.g., model_reference_helpers) and both modules should import those helpers instead of keeping separate copies; locate the duplicated utilities in api/transformerlab/plugins/image_diffusion/main.py and api/transformerlab/plugins/image_diffusion/diffusion_worker.py, move the helper definitions into the new shared module, update both files to import the helpers (removing the local copies and any redundant imports like inspect/Path if no longer needed), and run/adjust any unit tests or usage sites to ensure the unified helpers' API matches prior behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@api/transformerlab/plugins/image_diffusion/diffusion_worker.py`:
- Around line 12-221: The helper utilities in diffusion_worker.py (functions
_is_probable_hf_repo_id, _extract_hf_repo_from_model_metadata,
resolve_diffusion_model_reference, filter_generation_kwargs_for_pipeline)
duplicate logic from main.py; refactor by extracting these helpers into a single
shared module (e.g., a utilities or helpers module) and replace the local
implementations with imports from that module, updating any local references
(including imports used inside _extract_hf_repo_from_model_metadata such as
ModelService/asyncio) to use the centralized implementation so there is a single
source of truth and no duplicated code.
---
Nitpick comments:
In `@api/transformerlab/plugins/image_diffusion/main.py`:
- Around line 4-5: The model-reference helper functions duplicated between
image_diffusion main and diffusion_worker should be extracted into a single
shared module (e.g., model_reference_helpers) and both modules should import
those helpers instead of keeping separate copies; locate the duplicated
utilities in api/transformerlab/plugins/image_diffusion/main.py and
api/transformerlab/plugins/image_diffusion/diffusion_worker.py, move the helper
definitions into the new shared module, update both files to import the helpers
(removing the local copies and any redundant imports like inspect/Path if no
longer needed), and run/adjust any unit tests or usage sites to ensure the
unified helpers' API matches prior behavior.
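The extraction could start with the simplest of the duplicated helpers. A sketch of a hypothetical shared module — the module name follows the review's suggestion, and the heuristic is an assumption about what the private helper checks, not the actual codebase:

```python
# model_reference_helpers.py -- hypothetical shared module per the
# review suggestion; both main.py and diffusion_worker.py would import
# from here instead of keeping private copies.

def is_probable_hf_repo_id(ref: str) -> bool:
    """Heuristic: Hugging Face repo ids look like "org/name" --
    exactly one slash with non-empty segments on both sides."""
    parts = ref.split("/")
    return len(parts) == 2 and all(parts)
```

Both call sites would then do `from model_reference_helpers import is_probable_hf_repo_id`, giving a single source of truth.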
```shell
# This avoids resolver-driven torch/torchvision drift (e.g. missing torchvision::nms).
uv pip install \
    "diffusers==0.36.0" \
    "transformers==4.57.1" \
```
This is the standard diffusers and transformers version we are using. You shouldn't have to put this in here?
Or are you saying that you want to upgrade peft but doing so changes diffusers and transformers version? If so, do we just upgrade peft in pyproject.toml?
Or is it because of diffsynth?
🧹 Nitpick comments (1)
api/transformerlab/plugin_sdk/plugin_harness.py (1)
110-123: Log message doesn't distinguish global config from team config.
Line 121 prints "user" or "team", but when both `user_id` and `team_id` are None, the config is actually sourced from global scope. Consider updating for accuracy:

```diff
- print(f"Set {target_key} from {'user' if user_id else 'team'} config")
+ source = "user" if user_id else ("team" if team_id else "global")
+ print(f"Set {target_key} from {source} config")
```

Similarly for line 123.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/transformerlab/plugin_sdk/plugin_harness.py` around lines 110 - 123, In set_config_env_vars, the log messages use {'user' if user_id else 'team'} which is incorrect when both user_id and team_id are None (global config); compute a source string like source = 'user' if user_id else 'team' if team_id else 'global' and use that source variable in both the success print (after setting os.environ[target_key]) and the exception warning so logs correctly show 'user', 'team', or 'global'; reference set_config_env_vars, target_key, get_db_config_value and os.environ when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@api/transformerlab/plugin_sdk/plugin_harness.py`:
- Around line 110-123: In set_config_env_vars, the log messages use {'user' if
user_id else 'team'} which is incorrect when both user_id and team_id are None
(global config); compute a source string like source = 'user' if user_id else
'team' if team_id else 'global' and use that source variable in both the success
print (after setting os.environ[target_key]) and the exception warning so logs
correctly show 'user', 'team', or 'global'; reference set_config_env_vars,
target_key, get_db_config_value and os.environ when making the change.
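The fix amounts to a three-way fallback. A standalone sketch of the suggested precedence logic:

```python
from typing import Optional

def config_source(user_id: Optional[str], team_id: Optional[str]) -> str:
    # Precedence per the suggestion: user overrides team, and when both
    # are absent the value came from global config -- the case the
    # original log message mislabeled as "team".
    if user_id:
        return "user"
    if team_id:
        return "team"
    return "global"
```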
Diffusion works for me, but training doesn't. I'm getting this error about FA3 in xformers for some reason. Let's make a decision at scrum.
…ab-app into add/z-image-ft
Overall, I don't think this is usable for anyone with a smaller GPU. It crashes for me on a dataset of 5 images with 24 GB of VRAM. I was able to make FLUX run with sharding, but instead of sharding here, the easiest thing to do is look at the VRAM management section, along with the other recommendations, in https://github.com/modelscope/DiffSynth-Studio/blob/main/docs/en/Model_Details/Z-Image.md
I would recommend we don't make any more changes to this. Let's close it and remake everything so that it just clones DiffSynth-Studio and executes their scripts directly. We can make a new PR after we move to local providers?
Tagging @dadmobile here for further opinions
```python
from typing import Optional


def get_db_config_value(key: str, team_id: Optional[str] = None, user_id: Optional[str] = None) -> Optional[str]:
```
Why did you remove the import from transformerlab.plugin and add the function here directly? Was there an issue?
| if "ncclCommShrink" in str(e): | ||
| print( | ||
| "Detected CUDA/NCCL mismatch while importing torch. " | ||
| "Reinstall the plugin venv with a torch build matching this machine's CUDA runtime." |
We should never face this issue since we do the base install, right?
```python
    return None


def resolve_diffusion_model_reference(model: str) -> str:
```
I don't think we need/support this right now; diffusers itself requires `model_index.json`.
```python
# cache_key = get_pipeline_key(model, adaptor, is_img2img, is_inpainting)

with _PIPELINES_LOCK:
    resolved_model = resolve_diffusion_model_reference(model)
```
We wouldn't need resolving here, right? The plugin_harness provides all the info correctly, and you wouldn't reach this stage if something was unresolved.
OK, agreed. @ParamThakkar123 I know you did a tonne on this PR, but let's take what we learned from it and instead focus on making a task with DiffSynth on the new-style tasks.